Fast vocabulary acquisition in an NMF-based self-learning vocal user interface
Abstract
In command-and-control applications, a vocal user interface (VUI) is useful for hands-free control of various devices, especially for people with a physical disability. The spoken utterances are usually restricted to a predefined list of phrases or to a restricted grammar, and the acoustic models work well for normal speech. While some state-of-the-art methods allow for user adaptation of the predefined acoustic models and lexicons, we pursue a fully adaptive VUI by learning both vocabulary and acoustics directly from interaction examples. A learning curve usually has a steep rise in the beginning and an asymptotic ceiling at the end. To limit tutoring time and to guarantee good performance in the long run, the word learning rate of the VUI should be fast and the learning curve should level off at a high accuracy. In order to deal with these performance indicators, we propose a multi-level VUI architecture and we investigate the effectiveness of alternative processing schemes. In the low-level layer, we explore the use of MIDA features (Mutual Information Discrimination Analysis) against conventional MFCC features. In the mid-level layer, we enhance the acoustic representation by means of phone posteriorgrams and clustering procedures. In the high-level layer, we use the NMF (Non-negative Matrix Factorization) procedure, which has been demonstrated to be an effective approach for word learning. We evaluate and discuss the performance and the feasibility of our approach in a realistic experimental setting of the VUI-user learning context. © 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
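To make the high-level layer more concrete, the following is a minimal sketch of NMF-based word learning on synthetic data. It is not taken from the paper: the stacking of label rows on top of acoustic rows, the KL-divergence multiplicative updates, the toy data generator, and all function names (`nmf_kl`, `kl_divergence`, `make_toy_data`, `decode`) are assumptions chosen for illustration only.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor non-negative V (rows x utterances) as W @ H using
    Lee-Seung multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence between V and its reconstruction WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def make_toy_data(n_words=3, n_acoustic=20, n_utts=60, seed=1):
    """Hypothetical data: each word has a fixed non-negative acoustic
    pattern; an utterance is the sum of the patterns of its words."""
    rng = np.random.default_rng(seed)
    patterns = rng.random((n_acoustic, n_words))
    labels = (rng.random((n_words, n_utts)) < 0.4).astype(float)
    # Make sure every utterance contains at least one word.
    labels[rng.integers(n_words, size=n_utts), np.arange(n_utts)] = 1.0
    return labels, patterns @ labels

def decode(W, n_labels, acoustics, n_iter=100, eps=1e-12):
    """Keep W fixed, estimate activations from the acoustic rows only,
    then map them to label scores through the label rows of W."""
    W_lab, W_ac = W[:n_labels], W[n_labels:]
    H = np.ones((W.shape[1], acoustics.shape[1]))
    for _ in range(n_iter):
        R = acoustics / (W_ac @ H + eps)
        H *= (W_ac.T @ R) / (W_ac.sum(axis=0)[:, None] + eps)
    return W_lab @ H

# Training: stack label supervision on top of the acoustic features.
labels, acoustics = make_toy_data()
V = np.vstack([labels, acoustics])
W, H = nmf_kl(V, rank=3)

# Decoding: score each word for each utterance from acoustics alone.
scores = decode(W, n_labels=3, acoustics=acoustics)
```

In this sketch the supervision enters only through the extra label rows in `V`, so each column of `W` pairs a label pattern with an acoustic pattern. Thresholding `scores` would give word detections; how well that works on real speech depends on the features and data, which this toy example does not model.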
Related resources
A Self Learning Vocal Interface for Speech-impaired Users
In this work we describe research aimed at developing an assistive vocal interface for users with a speech impairment. In contrast to existing approaches, the vocal interface is self-learning, which means it is maximally adapted to the end-user and can be used with any language, dialect, vocabulary and grammar. The paper describes the overall learning framework and the vocabulary acquisition te...
Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface
ACORNS English corpus; utterances consist of 1 to 4 keywords and filler words; vocabulary of 50 keywords. Experimental variables: training set sizes of 100, 200, 500, 1000, 2000, 4000, and 9821 utterances; label noise affecting 0, 10, 30, 50, 70, and 90% of the utterances in the training set; four types of label noise (see box 4). Results: Label noise robustness and learning spe...
A Self-Learning Assistive Vocal Interface Based on Vocabulary Learning and Grammar Induction
This paper introduces research within the ALADIN project, which aims to develop an assistive vocal interface for people with a physical impairment. In contrast to existing approaches, the vocal interface is self-learning which means it can be used with any language, dialect, vocabulary and grammar. The paper describes the overall learning framework, and the two components that will provide voca...
Self-taught assistive vocal interfaces: an overview of the ALADIN project
This paper gives an overview of research within the ALADIN project, which aims to develop an assistive vocal interface for people with a physical impairment. In contrast to existing approaches, the vocal interface is trained by the end-user himself, which means it can be used with any vocabulary and grammar, and that it is maximally adapted to the — possibly dysarthric — speech of the user. Thi...
Towards a Self-Learning Assistive Vocal Interface: Vocabulary and Grammar Learning
This paper introduces research within the ALADIN project, which aims to develop an assistive vocal interface for people with a physical impairment. In contrast to existing approaches, the vocal interface is self-learning, which means it can be used with any language, dialect, vocabulary and grammar. This paper describes the overall learning framework, and the two components that will provide vo...
Journal: Computer Speech & Language
Volume: 28
Publication date: 2014